A Genetic Algorithm Assisted Hybrid Approach to
Abstract
Heterogeneity and interoperability of Web data sources are the current key issues in Web information extraction and integration. The warehouse approach and the virtual approach are the common approaches adopted to integrate heterogeneous Web data sources. However, few analytic models or cost models have been developed to measure and assess the efficiency and effectiveness of either approach or of a combination of the two. Hence, a contingency model cannot be produced to help the search engine select and mix the warehouse method and the virtual method. In this study, we present a genetic algorithm assisted hybrid approach to help the search engine evaluate the cost and performance factors. We apply the genetic algorithm technique to formulate a cost optimization model and to compute and compare the costs of extraction and integration. The cost model is based on a collection and compilation of the property data of the query analysis and path expression of the involved Web data sources. Six property analyses are conducted and six evolution steps are created to formulate the genetic algorithm of optimization. Further, we conduct a preliminary experiment using 15 local and global Web bookstores to install and test the method. Our experimental results show that cost optimization can be achieved with the genetic algorithm and factor analysis.
INTRODUCTION
World Wide Web (WWW or Web) technology has significantly changed the business and personal information computing environment. It alters the way businesses and individuals search for and secure data, and it changes the way data is accessed and presented. With wireless technology emerging, Web information extraction and integration over mobile computing platforms gives a compelling reason to tackle the heterogeneity and interoperability issue with quantitative measures of cost and performance. Further, businesses and individuals spend ever more time and effort searching and surfing the Web.
People devote ever more attention and resources to the Web as a medium for communicating and disseminating data and information. As in distributed computing, heterogeneity and interoperability have become one of the key issues in Web information extraction and integration. In this paper, we present a genetic algorithm assisted method to tackle this research issue. We develop a genetic algorithm assisted optimal cost model and contingency model. We formulate a cost optimization with factor analysis to suggest a quantitative measure for selecting or combining the warehouse approach and the virtual approach. Six property analyses and six evolution steps are generated. A continuous query mechanism is applied to segment query requests and assemble query responses. We conduct a preliminary experiment using a standard set of local and global Internet bookstores to install and test the method. Experimental results show that the optimal cost can be achieved in a dynamic process, and that the exclusive or hybrid use of the two approaches can be determined.
This paper is organized into six sections. Section one introduces this research. Section two surveys the warehouse approach and the virtual approach. Section three reviews the genetic algorithm. Section four presents the cost optimization model and factor analysis. Section five describes the preliminary experiments. Section six concludes this paper with a discussion and a brief summary.
WEB INFORMATION EXTRACTION AND INTEGRATION
Web information extraction and integration processes Web queries and operations over single or multiple Web data sources. Various studies, including (Etzioni et al., 1994; Woelk et al., 1995; Arens et al., 1996; Levy et al., 1996; Garcia-Molina et al., 1997; Duschka et al., 1997; Friedman et al., 1997; Ambite et al., 1998; Beeri
Similar resources
A novel hybrid genetic algorithm to solve the make-to-order sequence-dependent flow-shop scheduling problem
The flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, no efficient algorithm is known for reaching the optimal solution of the problem. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this pap...
The hybrid approach based on genetic algorithm and neural network to predict financial fraud in banks
Audit has become an essential topic in the world because there is much evidence about deliberate manipulations in the reports. One of the concerns in the field of audit is discovery and search of the financial statements and the high volume of bulk data. This study tried to suggest the appropriate method to detect these frauds due to the data which has been available and a proposed algorithm. R...
Scheduling of a flexible flow shop with multiprocessor task by a hybrid approach based on genetic and imperialist competitive algorithms
This paper presents a new mathematical model for a hybrid flow shop scheduling problem with multiprocessor tasks in which sequence dependent set up times and preemption are considered. The objective is to minimize the weighted sum of makespan and maximum tardiness. Three meta-heuristic methods based on genetic algorithm (GA), imperialist competitive algorithm (ICA) and a hybrid approach of GA a...
A New Hybrid Meta-Heuristics Approach to Solve the Parallel Machine Scheduling Problem Considering Human Resiliency Engineering
This paper proposes a mixed integer programming model to solve a non-identical parallel machine (NIPM) scheduling problem with sequence-dependent set-up times and human resiliency engineering. The presented mathematical model is formulated to consider human factors including Learning, Teamwork and Awareness. Moreover, processing times of jobs are assumed to be non-deterministic and dependent on their st...
Parametric optimization of cylindrical grinding process through hybrid Taguchi method and RSM approach using genetic algorithm
The present investigation proposes a hybrid technique: Taguchi method, response surface methodology (RSM) and genetic algorithm (GA), to analyze, model and predict vibration and surface roughness in traverse cut cylindrical grinding of aluminum alloy. Experiments have been conducted as per L9 orthogonal array of Taguchi methodology using several levels of the grinding parameters. Analysis of va...
Publication date: 2004